-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for ARM64 to lambda-promtail drone build job #5354
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right, at least I don't see the drone job building lambda-promtail-arm64
Successfully tagged 25e0d225ae133d9faf0acf560f7c6273b24f4064:latest
+ /usr/local/bin/docker tag 25e0d225ae133d9faf0acf560f7c6273b24f4064 public.ecr.aws/grafana/lambda-promtail:main-17a1a17-amd64
+ /usr/local/bin/docker tag 25e0d225ae133d9faf0acf560f7c6273b24f4064 public.ecr.aws/grafana/lambda-promtail:main
+ /usr/local/bin/docker rmi 25e0d225ae133d9faf0acf560f7c6273b24f4064
I think we need further modifications to the jsonnet code to make the lambda_promtail stages look more like the other images we're building.
Hi @cstyan, |
🤦♂️ Ah right, if you give me push access to your fork I can push a change with the new generated yaml. |
Sure, I've sent you an invitation. |
Lets see how this goes. |
Okay, I think there's more that we need to do here. The docker image that runs to build the image on builds for the default architecture in the alpine image, that needs to be changed so that if we're doing the arm64 build we crosscompile for arm64. |
I'll take a look it at. Any advice on testing it locally? Is |
Taking another look at it. The |
|
You can try running the drone pipeline locally, but I'd start with getting the dockerfile working to build an arm64 image that you confirm runs lambda-promtail as expected. |
I'll try to convince you. Compare manifests for
and
It's obvious there's ARM64 variant for
while for
Exactly the same as reported by the Drone build:
As I already stated in the PR description, I tried to actually build the ARM64 |
I'm not sure how your build locally via the dockerfile to build the binary for arm64 works, the docker image only builds lambda-promtail for the default arch on the I think we either need to modify the Dockerfile to optionally take an argument to change the Go build architecture to arm64, and have a second pipeline for that, or double check that the ECR image should work for arm64 and potentially publish another version of the image? I honestly don't know how this all currently works for the other images we build that do support arm64. |
From the Drone documentation:
I modified the I'm pretty sure the only thing needed is to build and publish the ARM64 variant of |
Hi @cstyan,
Now there's a Docker image
I update the Drone pipeline to use this image to confirm it works, but it seems the build here is blocked (https://drone.grafana.net/grafana/loki/8807). If you could unblock it, we would see if that's the missing part or not. The cross-compile way doesn't look the best to me, for couple of reasons:
Thus, in compliance with Occam's razor, I'd prefer this PR over the #5403 (granted this approarch works). |
As a sidenote, couple of days ago I create a PR to add ECR Public support to the official |
Hi @cstyan, I just pushed an effort to add the manifest part for ECR images. I tried to mirror the other parts as much as possible, changing just |
again. Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Hey @tlinhart thanks again for all your awesome work here. There was a minor issue with the volumes definition in the ecr function, I made a small change to fix that and regenerated the |
Signed-off-by: Callum Styan <[email protected]>
task Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
okay @tlinhart it looks like maybe the way we're setting up the authentication is incorrect? |
@cstyan thanks for the corrections! Seems like I forgot to define the Docker volume in the pipeline (https://docs.drone.io/pipeline/docker/syntax/volumes/temporary/). I added it to the |
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
@tlinhart it looks like we have a mismatch between the SHA used in the image tag when building and publishing the image, and the SHA used to try and find the images when creating the manifest. Unfortunately I haven't been able to track the source of that mismatch down, but I do have a few other things in progress right now. Let me know if you can find the issue, otherwise I can dig into this more later in the week. |
Hi @cstyan, I tried to simulate the process and I think the problem is only due to the fact that we try to create and push the manifest list from the fork/PR. If pushed to the
This comes from
So, I'd say that everything will work just fine once merged to |
Hi @cstyan, just a quick ping if you could take a look at my last comment. |
Hi @cstyan, is there anything I can do to make a progress? |
Hey @tlinhart, sorry for the delay! Just read over your explanation, it makes sense to me. I will run it by another person on the team who has some experience with our CI and image build setup next week just to confirm. It looks like the |
Great, thanks @cstyan! |
Hi @cstyan, could you please give it a try and confirm my assumptions? |
I think we're probably good to go. I will regenerate the drone yaml file in the morning. |
Signed-off-by: Callum Styan <[email protected]>
Signed-off-by: Callum Styan <[email protected]>
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
@tlinhart everything looking good? |
Hi @cstyan, |
What this PR does / why we need it:
Using ARM64 architecture for Lambda functions can lead to significant price/performance gain. For reference:
Up until now, the solution was to custom-build an ARM64 image for Promtail Lambda and push it to the private ECR registry. However, since the release of pull through cache repositories (https://aws.amazon.com/about-aws/whats-new/2021/11/amazon-ecr-cache-repositories/) it's now much easier to keep the images from public registries in private ECR registry up to date. This is especially true when using IaC (e.g. Pulumi).
Which issue(s) this PR fixes:
Special notes for your reviewer:
Discussed on Slack with @cstyan (https://grafana.slack.com/archives/CEPJRLQNL/p1644258844007079).
I tried to build the ARM64 image and use it in production with success.
Checklist
CHANGELOG.md
about the changes.